InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials

Han, Xiao-Qi, Guo, Peng-Jie, Gao, Ze-Feng, Sun, Hao, Lu, Zhong-Yi

arXiv.org Artificial Intelligence

Developing inverse design methods for functional materials with specific properties is critical to advancing fields like renewable energy, catalysis, energy storage, and carbon capture. Generative models based on diffusion principles can directly produce new materials that meet performance constraints, thereby significantly accelerating the material design process. However, existing methods for generating and predicting crystal structures often remain limited by low success rates. In this work, we propose a novel inverse material design generative framework called InvDesFlow-AL, which is based on active learning strategies. This framework iteratively optimizes the material generation process, gradually guiding it toward desired performance characteristics. In crystal structure prediction, the InvDesFlow-AL model achieves an RMSE of 0.0423 Å, a 32.96% improvement over existing generative models. Additionally, InvDesFlow-AL has been successfully validated in the design of low-formation-energy and low-Ehull materials. It can systematically generate materials with progressively lower formation energies while continuously expanding the exploration across diverse chemical spaces. These results demonstrate the effectiveness of the proposed active learning-driven generative model in accelerating material discovery and inverse design. To further prove the effectiveness of this method, we used InvDesFlow-AL to search for BCS superconductors at ambient pressure. As a result, we successfully identified Li\(_2\)AuH\(_6\) as a conventional BCS superconductor with an ultra-high transition temperature of 140 K. This discovery provides strong empirical support for the application of inverse design in materials science.
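The generate-screen-retrain cycle the abstract describes can be illustrated with a toy loop. This is only a minimal sketch of the active-learning idea, assuming scalar stand-ins for crystal structures and an identity "property" evaluator; the class and method names here are hypothetical placeholders, not the paper's actual API.

```python
import random

class ToyGenerator:
    """Stands in for a diffusion-based structure generator."""
    def __init__(self, center=0.0, spread=5.0):
        self.center, self.spread = center, spread

    def sample(self, n):
        # Propose n candidate "structures" (here, just scalars).
        return [random.gauss(self.center, self.spread) for _ in range(n)]

    def finetune(self, selected):
        # Bias the sampling distribution toward the selected candidates.
        self.center = sum(selected) / len(selected)
        self.spread *= 0.8  # concentrate around promising regions

def property_of(x):
    """Stands in for a DFT or surrogate property evaluation."""
    return x

def active_learning_loop(target, rounds=10, batch=200, keep=20):
    gen = ToyGenerator()
    best = None
    for _ in range(rounds):
        candidates = gen.sample(batch)
        # Rank by closeness to the target property; keep the best few.
        ranked = sorted(candidates, key=lambda c: abs(property_of(c) - target))
        selected = ranked[:keep]
        gen.finetune(selected)  # guide the generator toward the target
        best = selected[0]
    return best

random.seed(0)
result = active_learning_loop(target=3.0)
```

In a real inverse-design workflow the "property" call would be an expensive DFT or ML-surrogate calculation, which is exactly why the loop only labels a small selected batch each round.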


What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

Choe, Sang Keun, Ahn, Hwijeen, Bae, Juhan, Zhao, Kewen, Kang, Minsoo, Chung, Youngseog, Pratapa, Adithya, Neiswanger, Willie, Strubell, Emma, Mitamura, Teruko, Schneider, Jeff, Hovy, Eduard, Grosse, Roger, Xing, Eric

arXiv.org Artificial Intelligence

Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data point to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast training datasets has been largely limited by prohibitive compute and memory costs. In this work, we focus on influence functions, a popular gradient-based data valuation method, and significantly improve its scalability with an efficient gradient projection strategy called LoGra that leverages the gradient structure in backpropagation. We then provide a theoretical motivation for gradient projection approaches to influence functions to promote trust in the data valuation process. Lastly, we lower the barrier to implementing data valuation systems by introducing LogIX, a software package that can transform existing training code into data valuation code with minimal effort. In our data valuation experiments, LoGra achieves competitive accuracy against more expensive baselines while showing up to 6,500x improvement in throughput and a 5x reduction in GPU memory usage when applied to Llama3-8B-Instruct and a 1B-token dataset.
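The core economics of gradient projection can be sketched numerically: compress per-example gradients into a low-dimensional space, then run the inverse-Hessian solve and influence scoring there. This is a hedged illustration of the general idea only, assuming a plain random projection and a damped Gauss-Newton-style Hessian; it is not LoGra's actual algorithm, which exploits the structure of backpropagation rather than a dense projection matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 10_000, 64, 500  # parameter dim, projected dim, training examples

# Random projection matrix (illustrative; LoGra's projection is structured).
P = rng.standard_normal((k, d)) / np.sqrt(k)

# Stand-in per-example training gradients and one test-example gradient.
train_grads = rng.standard_normal((n, d))
test_grad = rng.standard_normal(d)

# Project all gradients to k dimensions: storage drops from O(n*d) to O(n*k).
G = train_grads @ P.T            # shape (n, k)
q = P @ test_grad                # shape (k,)

# Damped Hessian approximation, formed entirely in the projected space.
lam = 0.1
H = G.T @ G / n + lam * np.eye(k)

# Influence score of each training example on the test example:
#   score_i ~ q^T H^{-1} g_i
scores = G @ np.linalg.solve(H, q)

# Most "valuable" training examples for this test output.
top = np.argsort(-scores)[:5]
```

The k x k solve replaces a d x d one, which is what makes influence computation tractable at LLM scale.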


Protecting Life-Saving Medical Devices from Cyberattack

Communications of the ACM

Smart medical gadgets are crucial for keeping people alive and healthy. From wearables that keep an eye on your heart rate all day to heart pumps and big machines such as ventilators and dialysis units, these devices often work non-stop. However, the sad reality is that cyber-security is not always top of mind when these devices are being created. Many are easily connected to the Internet, often have simple passwords, or sometimes do not even require passwords. This lack of security is a huge problem because it allows hackers to not only break into the devices themselves, but also to penetrate hospital systems and wreak havoc with harmful software.


Top 5 Web Analytics Tools for Improving Site Performance

#artificialintelligence

Behind every successful website, there is a comprehensive analytics tool that directs and informs all its strategies and updates. Web analytics tools are the most reliable indicators of whether a website is working to its full potential and how it fares compared to the competition. Advanced web analytics software can make or break your operations when running any online business. However, with the abundance of tools available today, choosing one that provides the highest quality data can be tricky. Read on for an overview of the most valuable and accurate web analytics tools you should consider for your organization.


The Essential Components of Digital Transformation

#artificialintelligence

The digital revolution forced every organization to reinvent itself, or at least rethink how it goes about doing business. Most large companies have invested substantial cash in what is generally labelled "digital transformation." While those investments are projected to top $6.8 trillion by 2023, they're often made without seeing clear benefits or ROI. Although these failures have multiple causes, they are generally the result of underestimating the various steps or stages required to successfully execute a transformation agenda. For example, common errors include the naïve assumption that by simply buying technology -- or investing in any of the fancy tools or shiny new objects of the booming tech market -- organizations will somehow transform.


How to Win New Business with External Data

#artificialintelligence

Increasingly, external data (alternative data, public data, open data – call it what you want) is being called the "secret sauce" of driving advanced analytics, developing machine learning and AI capabilities, enriching existing models, and delivering unrealized insights to every part of your organization. The difficulty in connecting to this data is top of mind for many businesses, and the issue of governing its use is crystallizing into a core strategy among organizations that have seen the GDPR's writing on the wall. For the moment, put aside the slightly fevered predictions about delivering artificial intelligence across the enterprise with a few datasets and a new CDO (setting up an external data strategy relies as much on culture as it does on data). It's important to acknowledge that for many organizations, external data should be approached not as a replacement for what's already working, but as an enhancement to what you've got today. Regardless of where an organization is on their journey towards being data-driven, the chances are good that data, in some form or another, is already hardwired into their architecture.


Council Post: How To Manage Your Data Intelligently In The Cloud Era

#artificialintelligence

Luke Han is a Co-Founder and CEO of Kyligence, as well as co-founder and Project Management Committee member for Apache Kylin. The cloud era has drastically changed how organizations handle data. Data that was once concentrated in one location or a central repository is now distributed, and with data spread out, individuals can access it anytime, anywhere. While accessing data has become easier than ever, however, it has also created chaos when it comes to using that data. Disparate data storage has scattered data almost to the point of being unusable.


AI and ML for Personal Customer Experiences (CX)

#artificialintelligence

In 2017, the Economist stated that the world's most valuable resource is no longer oil, but data. Four years later, that observation has only grown truer. Thanks to the revolutionary promises of 5G, artificial intelligence (AI) and machine learning (ML) possibilities are transforming the value of the data collected on consumers and our habits every single day. With 5G usage predicted to explode in coming years, with over 1 billion 5G connections by 2023, the possibilities of AI and ML solutions are seemingly becoming limitless. Gone are the days when your mobile phone or laptop were the only devices collecting your data.


3 Top Artificial Intelligence Stocks to Buy in September

#artificialintelligence

Research firm IDC estimates that global revenue from artificial-intelligence hardware, software, and services will climb roughly 12% annually to reach $156.5 billion this year, and spending is expected to accelerate as coronavirus-related pressures ease. The firm projects a compound annual growth rate of roughly 17.1% for the category across the five-year period spanning from 2020 through 2024. Artificial intelligence (AI) has the potential to influence almost every industry, and the overall economic impact of the tech will exceed what's spent on hardware, software, and services many times over. The great news for investors is that the artificial intelligence revolution is still in its early innings, and the beneficial impact that it will have for industry leaders is still underappreciated. With that in mind, here's why Appian (NASDAQ: APPN), Xilinx (NASDAQ: XLNX), and Facebook (NASDAQ: FB) are three top AI stocks to buy this month.


Machine Learning Engineer - Customer Engagement

#artificialintelligence

Because you belong at Twilio. Twilio seeks a Machine Learning Engineer to be a key leader in defining a new product offering at Twilio in the customer engagement space. The person in this role will be critical in shaping Twilio's data and intelligence strategy, which will empower our customers to create highly personalized communications and experiences for their contacts. Come be part of a team that's building a set of ML-driven APIs that deliver intelligent audience and personalization recommendations. Increasingly, we're hearing from our B2C customers that they're struggling to harness the massive amounts of valuable data they generate, much of which stems from the communications we help them send.